Abstract

In this work, we propose a game theoretic framework to analyze the behavior of 
cognitive radios for distributed adaptive channel allocation. We define two different 
objective functions for the spectrum sharing games, which capture the utility of selfish 
users and cooperative users, respectively. Based on the utility definition for cooperative 
users, we show that the channel allocation problem can be formulated as a potential 
game, and thus converges to a deterministic channel allocation Nash equilibrium point. 
Alternatively, a no-regret learning implementation is proposed for both scenarios and 
it is shown to have similar performance to the potential game when cooperation is 
enforced, but with a higher variability across users. The no-regret learning formulation 
is particularly useful to accommodate selfish users. Non-cooperative learning games 
have the advantage of a very low overhead for information exchange in the network. 

We show that a cooperation-based spectrum sharing etiquette improves the overall 
network performance at the expense of an increased overhead required for information 
exchange. 

1 Introduction 



With the new paradigm shift in the FCC's spectrum management policy [2] that creates 
opportunities for new, more aggressive, spectrum reuse, cognitive radio technology lays the 
foundation for the deployment of smart flexible networks that cooperatively adapt to increase 
the overall network performance. The cognitive radio terminology was coined by Mitola P] , 
and refers to a smart radio which has the ability to sense the external environment, learn from 
the history, and make intelligent decisions to adjust its transmission parameters according 
to the current state of the environment. 

The potential contributions of cognitive radios to spectrum sharing and an initial frame- 
work for formal radio etiquette have been discussed in ^j. According to the proposed eti- 
quette, the users should listen to the environment, determine the radio temperature of the 
channels and estimate their interference contributions on their neighbors. Based on these 
measurements, the users should react by changing their transmission parameters if some 
other users may need to use the channel. 

While it is clear that this etiquette promotes cooperation between cognitive radios, the 
behavior of networks of cognitive radios running distributed resource allocation algorithms 
is less well understood. 

As the cognitive radios are essentially autonomous agents that are learning their envi- 
ronment and are optimizing their performance by modifying their transmission parameters, 
their interactions can be modeled using a game theoretic framework. In this framework, 
the cognitive radios are the players and their actions are the selection of new transmission 
parameters and new transmission frequencies, etc., which influence their own performance, 
as well as the performance of the neighboring players. 

Game theory has been extensively applied in microeconomics, and only more recently 
has received attention as a useful tool to design and analyze distributed resource allocation 
algorithms (e.g. [Z|-[H])- Some game theoretic models for cognitive radio networks were 
presented in PJ, which has identified potential game formulations for power control, call 
admission control and interference avoidance in cognitive radio networks. The convergence 
conditions for various game models in cognitive radio networks are investigated in PH] . 

In this work, we propose a game theoretic formulation of the adaptive channel allocation 
problem for cognitive radios. Our current work assumes that the radios can measure the 
local interference temperature on different frequencies and can adjust by optimizing the 
information transmission rate for a given channel quality (using adaptive channel coding) 
and by possibly switching to a different frequency channel. The cognitive radios' decisions are 
based on their perceived utility associated with each possible action. We propose two different 
utility definitions, which reflect the amount of cooperation enforced by the spectrum sharing 
etiquette. We then design adaptation protocols based on both a potential game formulation, 
as well as no-regret learning algorithms. We study the convergence properties of the proposed 
adaptation algorithms, as well as the tradeoffs involved. 

2 System Model 

The cognitive radio network we consider consists of a set of N transmitting-receiving pairs of 
nodes, uniformly distributed in a square region of dimension D* x D*. We assume that the 
nodes are either fixed, or are moving slowly (slower than the convergence time for the pro- 
posed algorithms). Fig. 1 shows an example of a network realization, where we used dashed 
lines to connect the transmitting node to its intended receiving node. The nodes measure 
the spectrum availability and decide on the transmission channel. We assume that there are 
K frequency channels available for transmission, with K < N . By distributively selecting 
a transmitting frequency, the radios effectively construct a channel reuse distribution map 
with reduced co-channel interference. 

The transmission link quality can be characterized by a required Bit Error Rate target 
(BER) , which is specific for the given application. An equivalent SIR target requirement can 
be determined, based on the modulation type and the amount of channel coding. 

The Signal-to-Interference Ratio (SIR) measured at the receiver j associated with trans- 
mitter i can be expressed as: 

$$ \mathrm{SIR}_{ij} = \frac{p_i G_{ij}}{\sum_{k=1, k \neq i}^{N} p_k G_{kj} I(k,j)} \tag{1} $$

where $p_i$ is the transmission power at transmitter i and $G_{ij}$ is the link gain between transmitter 
i and receiver j. $I(i,j)$ is the interference function characterizing the interference created by 
node i to node j and is defined as 

$$ I(i,j) = \begin{cases} 1 & \text{if transmitters } i \text{ and } j \text{ are transmitting over the same channel} \\ 0 & \text{otherwise} \end{cases} \tag{2} $$

Analyzing (1), we see that in order to maintain a certain BER constraint the nodes can 
adjust at both the physical and the network layer level. At the network level, the nodes can 
minimize the interference by appropriately selecting the transmission channel frequency. At 
the physical layer, power control can reduce interference and, for a feasible system, results 
in all users meeting their SIR constraints. Alternatively, the target SIR requirements can 
be changed (reduced or increased) by using different modulation levels and various channel 
coding rates. As an example of adaptation at the physical layer, we have assumed that for 
a fixed transmission power level, software defined radios enable the nodes to adjust their 
transmission rates and consequently the required SIR targets by varying the amount of 
channel coding for a data packet. 

For our simulations we have assumed that all users have packets to transmit at all times 
(worst case scenario). Multiple users are allowed to transmit at the same time over a shared 
channel. We assume that users in the network are identical, which means they have an 
identical action set and identical utility functions associated with the possible actions. 

The BER requirement selected for simulations is 10~^, and we assume the use of a Reed- 
Muller channel code RM(1,m). In Table 1 we show the coding rate combinations and the 
corresponding SIR target requirements used for our simulations [11]. 
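As an illustration of (1) and (2), the following Python sketch computes the SIR of each link for a given channel assignment. The function name `sir`, the list-based data layout and the numerical values are assumptions made for the example. The gain index order follows the text (`G[k][j]` is the gain between transmitter k and receiver j), and each receiver is assumed to listen on the channel chosen by its own transmitter.

```python
import math

def sir(i, j, p, G, channel):
    """SIR at receiver j of transmitter i for a given channel assignment.

    p[k]       : transmission power of transmitter k
    G[k][j]    : link gain between transmitter k and receiver j
    channel[k] : index of the channel used by transmitter k
    """
    # Only co-channel transmitters contribute (interference function equal to 1).
    interference = sum(
        p[k] * G[k][j]
        for k in range(len(p))
        if k != i and channel[k] == channel[i]
    )
    if interference == 0:
        return math.inf  # no co-channel interferer
    return p[i] * G[i][j] / interference

# Hypothetical 3-pair example: pair k consists of transmitter k and receiver k.
p = [1.0, 1.0, 1.0]
G = [[1.0, 0.2, 0.1],
     [0.25, 1.0, 0.3],
     [0.05, 0.4, 1.0]]
channel = [0, 0, 1]  # pairs 0 and 1 share a channel, pair 2 is alone
sir_values = [sir(k, k, p, G, channel) for k in range(3)]
sir_db = [10 * math.log10(v) if math.isfinite(v) else math.inf for v in sir_values]
```

Comparing `sir_db` against the SIR target of each coding rate would then indicate which rates a link can support under that assignment.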
3 A Game Theoretic Framework 



Game theory represents a set of mathematical tools developed for the purpose of analyzing 
the interactions in decision processes. In particular, we can model our channel allocation 
problem as the outcome of a game, in which the players are the cognitive radios, their actions 
(strategies) are the choice of a transmitting channel, and their preferences are associated with 
the quality of the channels. The quality of the channels is determined by the cognitive radios 
through measurements on different radio frequencies. 

We model our channel allocation problem as a normal form game, which can be mathe- 
matically defined as $\Gamma = \{N, \{S_i\}_{i \in N}, \{U_i\}_{i \in N}\}$, where N is the finite set of players (decision 
makers), and $S_i$ is the set of strategies associated with player i. We define $\mathbb{S} = \times S_i, i \in N$ as 
the strategy space, and $U_i : \mathbb{S} \to \mathbb{R}$ as the set of utility functions that the players associate 
with their strategies. For every player i in game $\Gamma$, the utility function, $U_i$, is a function of 
$s_i$, the strategy selected by player i, and of the current strategy profile of its opponents, $s_{-i}$. 

In analyzing the outcome of the game, as the players make decisions independently and 
are influenced by the other players' decisions, we are interested in determining whether there exists 
a convergence point for the adaptive channel selection algorithm, from which no player 
would deviate anymore, i.e. a Nash equilibrium (NE). A strategy profile for the players, 
$S = [s_1, s_2, \ldots, s_N]$, is a NE if and only if 

$$ U_i(S) \geq U_i(s_i', s_{-i}), \quad \forall i \in N, \ s_i' \in S_i. \tag{3} $$

If the equilibrium strategy profile in (3) is deterministic, a pure strategy Nash equilibrium 
exists. For finite games, even if a pure strategy Nash equilibrium does not exist, a mixed 
strategy Nash equilibrium can be found (equilibrium is characterized by a set of probabilities 
assigned to the pure strategies). 

As becomes apparent from the above discussion, the performance of the adaptation al- 
gorithm depends significantly on the choice of the utility function which characterizes the 
preference of a user for a particular channel. The choice of a utility function is not unique. 
It must be selected to have physical meaning for the particular application, and also to 
have appealing mathematical properties that will guarantee equilibrium convergence for the 
adaptation algorithm. We have studied and proposed two different utility functions, which 
capture the channel quality, as well as the level of cooperation and fairness in sharing the 
network resources. 
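The Nash equilibrium condition in (3) can be checked by brute force on small games. The sketch below is only an illustration: `is_pure_nash` and `toy_utility` are hypothetical names, all players are assumed to share the same action set, and the toy utility simply counts co-channel players rather than using measured interference.

```python
def is_pure_nash(profile, strategies, utility):
    """Return True if no player can gain by a unilateral deviation.

    profile    : list with the strategy currently chosen by each player
    strategies : list of strategies available to every player
    utility    : function (i, profile) -> utility of player i
    """
    for i in range(len(profile)):
        current = utility(i, profile)
        for s in strategies:
            deviated = list(profile)
            deviated[i] = s
            if utility(i, deviated) > current:
                return False
    return True

def toy_utility(i, profile):
    # Hypothetical utility: minus the number of other players on the same channel.
    return -sum(1 for j, s in enumerate(profile) if j != i and s == profile[i])

print(is_pure_nash([0, 1], [0, 1], toy_utility))  # True: the two players use different channels
print(is_pure_nash([0, 0], [0, 1], toy_utility))  # False: either player gains by switching
```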

3.1 Utility Functions 

The first utility function (U1) we propose accounts for the case of a "selfish" user, which 
values a channel based on the level of interference perceived on that particular channel: 

$$ U1_i(s_i, s_{-i}) = -\sum_{j \neq i, j=1}^{N} p_j G_{ij} f(s_j, s_i), \quad \forall i = 1, 2, \ldots, N. \tag{4} $$

For the above definition, we denoted $P = [p_1, p_2, \ldots, p_N]$ as the transmission powers for the 
N radios, $S = [s_1, s_2, \ldots, s_N]$ as the strategy profile and $f(s_i, s_j)$ as an interference function: 

$$ f(s_i, s_j) = \begin{cases} 1 & \text{if } s_j = s_i \text{ (transmitters } i \text{ and } j \text{ choose the same channel)} \\ 0 & \text{otherwise.} \end{cases} $$

This choice of the utility function requires a minimal amount of information for the 
adaptation algorithm, namely the interference measurement of a particular user on different 
channels. 

The second utility function we propose accounts for the interference seen by a user on 
a particular channel, as well as for the interference this particular choice will create to 
neighboring nodes. Mathematically we can define U2 as: 

$$ U2_i(s_i, s_{-i}) = -\sum_{j \neq i, j=1}^{N} p_j G_{ij} f(s_j, s_i) - \sum_{j \neq i, j=1}^{N} p_i G_{ji} f(s_i, s_j), \quad \forall i = 1, 2, \ldots, N. \tag{5} $$

The complexity of the algorithm implementation will increase for this particular case, as 
the algorithm will require probing packets on a common access channel for measuring and 
estimating the interference a user will create to neighboring radios. 

The above defined utility functions characterize a user's level of cooperation and support 
a selfish and a cooperative spectrum sharing etiquette, respectively. 
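A minimal sketch of the two utilities may help make the difference concrete. The function names `u1` and `u2`, the list-of-lists gain matrix and the numerical values are assumptions for illustration; the indices of `G` follow the formulas as written (`G[i][j]` multiplies the power of user j in the interference seen by user i).

```python
def f(s_a, s_b):
    """Interference function: 1 if both users picked the same channel, else 0."""
    return 1 if s_a == s_b else 0

def u1(i, S, p, G):
    """Selfish utility: minus the interference user i perceives on its channel."""
    return -sum(p[j] * G[i][j] * f(S[j], S[i]) for j in range(len(S)) if j != i)

def u2(i, S, p, G):
    """Cooperative utility: U1 minus the interference user i creates to the others."""
    created = sum(p[i] * G[j][i] * f(S[i], S[j]) for j in range(len(S)) if j != i)
    return u1(i, S, p, G) - created

# Hypothetical 3-user example with a symmetric gain matrix.
p = [1.0, 2.0, 1.0]
G = [[0.0, 0.5, 0.25],
     [0.5, 0.0, 0.125],
     [0.25, 0.125, 0.0]]
S = [0, 0, 1]  # users 0 and 1 share channel 0, user 2 is alone on channel 1

print(u1(0, S, p, G))  # -1.0: user 0 sees p[1] * G[0][1]
print(u2(0, S, p, G))  # -1.5: additionally creates p[0] * G[1][0] at user 1
```

Note that evaluating `u2` needs the gains towards the other users, which is the extra information the probing packets are meant to provide.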

3.2 A Potential Game Formulation 

In the previous section we have discussed the choice of the utility functions based on the 
physical meaning criterion. However, in order to have good convergence properties for the 
adaptation algorithm we need to impose some mathematical properties on these functions. 
There are certain classes of games that have been shown to converge to a Nash equilibrium 
when a best response adaptive strategy is employed. In what follows, we show that for the 
U2 utility function, we can formulate an exact potential game, which converges to a pure 
strategy Nash equilibrium solution. 

Characteristic for a potential game is the existence of a potential function that exactly 
reflects any unilateral change in the utility function of any player. The potential function 
models the information associated with the improvement paths of a game instead of the 
exact utility of the game [12]. 

An exact potential function is defined as a function $P : \mathbb{S} \to \mathbb{R}$ with the property that, 
for all i and all $s_i, s_i' \in S_i$, 

$$ U_i(s_i, s_{-i}) - U_i(s_i', s_{-i}) = P(s_i, s_{-i}) - P(s_i', s_{-i}). \tag{6} $$

If a potential function can be defined for a game, the game is an exact potential game. 
In an exact potential game, for a change in actions of a single player the change in the 
potential function is equal to the value of the improvement deviation. Any finite potential game 
in which players take actions sequentially converges to a pure strategy Nash equilibrium that 
(locally) maximizes the potential function. 

For our previously formulated channel allocation game with utility function U2, we can 
define an exact potential function to be 

$$ \mathrm{Pot}(S) = \mathrm{Pot}(s_i, s_{-i}) = \sum_{i=1}^{N} \left( -\frac{1}{2} \sum_{j \neq i, j=1}^{N} p_j G_{ij} f(s_j, s_i) - \frac{1}{2} \sum_{j \neq i, j=1}^{N} p_i G_{ji} f(s_i, s_j) \right). \tag{7} $$

The function in (7) essentially reflects the network utility. It can thus be seen that the 
potential game property (6) ensures that an increase in individual users' utilities contributes 
to the increase of the overall network utility. We note that this property holds only if users 
take actions sequentially, following a best response strategy. 

The proof that (7) is an exact potential function is given in the Appendix. 

Consequently, to ensure convergence for the spectrum allocation game, either a central- 
ized or a distributed scheduler should be deployed. In an ad hoc network, the latter solution 
is preferable. To this end, we propose a random access for decision making in which each user 
is successful with probability $p_a = 1/N$. More specifically, at the beginning of each time slot, 
each user flips a coin with probability $p_a$, and, if successful, makes a new decision based on 
the current values for the utility functions for each channel; otherwise it takes no new action. 
We note that the number of users that attempt to share each channel can be determined 
from channel listening, as we will detail shortly. The proposed random access ensures that on 
average exactly one user makes decisions at a time, but of course has a nonzero probability to 
have two or more users taking actions simultaneously. We have determined experimentally 
that the convergence of the game is robust to this phenomenon: when two or more users 
simultaneously choose channels, the potential function may temporarily decrease (decreasing 
the overall network performance) but then the upward monotonic trend is re-established. 

The proposed potential game formulation requires that users be able to evaluate 
the candidate channels' utility function U2. To provide all the information necessary to 
determine U2, we propose a signaling protocol based on a three-way handshake. The 
signaling protocol is somewhat similar to the RTS-CTS packet exchange for the IEEE 802.11 
protocol, but intended as a call admission reservation protocol, rather than packet access 
reservation protocol. When a user needs to make a decision on selecting the best transmission 
frequency (a new call is initiated or terminated, and user is successful in the Bernoulli 
trial), such a handshaking is initiated. In contrast to the RTS-CTS reservation mechanism, 
the signaling packets, START, START_CH, ACK_START_CH (END, ACK_END) in our 
protocol, are not used for deferring transmission for the colliding users, but rather to measure 
the interference components of the utility functions for different frequencies and to assist in 
computing the utility function. The signaling packets have a double role: to announce 
the action of the current user to select a particular channel for transmission, and to serve 
as probing packets for interference measurements on the selected channel. The signaling 
packets are transmitted with a fixed transmission power on a common control channel. To 
simplify the analysis, we assume that no collisions occur on the common control channel. As 
we mentioned before, the convergence of the adaptation algorithm was experimentally shown 
to be robust to collision situations. For a better frequency planning, it is desirable to use a 
higher transmission power for the signaling packets than for the transmitted packets. This 
will permit the users to learn the potential interferers over a larger area. For our simulations, 
we have selected the ratio of transmitted powers between signaling and data packets to be 
equal to 2. 

We note that the U2 utility function has two parts: a) a measure of the interference 
created by others on the desired user, $I_d$; b) a measure of the interference created by the user 
on its neighbors' transmissions, $I_o$. The first part of U2 can be estimated at the receiving 
node, while the second part can only be estimated at the transmitter node. Therefore, the 
protocol requires that both transmitter and receiver listen to the control channel, and each 
maintain an information table on all frequencies, similar to the NAV table in 802.11. In 
what follows, we outline the steps of the protocol. 

Protocol steps: 

1. Bernoulli trial with $p_a$: 
   if 0, listen to the common control channel; break. 
   if 1, go to 2). 

2. Transmitter sends START packet: includes current estimates for the interference cre- 
ated to neighboring users on all possible frequencies, $I_o(f)$ (this information is com- 
puted based on information saved in the Channel Status Table); 

3. Receiver computes current interference estimate for the user, $I_d(f)$, determines $U2(f) = 
I_d(f) + I_o(f)$ for all channels, and decides on the channel with the highest U2 (in case of 
equality, the selection is randomized, with equal probability of selecting the channels); 

4. Receiver includes the newly selected channel information in a signaling packet START_CH, 
which is transmitted on the common channel; 

5. Transmitter sends ACK_START_CH, which acknowledges the decision of transmitting 
on the newly selected frequency, and starts transmitting on the newly selected channel; 

6. All the other users (transmitters and receivers) that heard the START_CH and 
ACK_START_CH packets update their Channel Status Tables (CST) accordingly. 
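The channel decision taken by the receiver in step 3 can be sketched as follows. This is only an illustration: `select_channel` is a hypothetical name, and $I_d(f)$ and $I_o(f)$ are assumed to be stored as the two signed terms of U2 (that is, as negated interference powers), so that the highest sum corresponds to the least interference.

```python
import random

def select_channel(I_d, I_o, rng=random):
    """Pick the channel with the highest U2(f) = I_d(f) + I_o(f).

    I_d[f] : signed term for interference seen by the user on channel f (receiver estimate)
    I_o[f] : signed term for interference created on channel f (carried by START)
    Ties are broken uniformly at random, as the protocol prescribes.
    """
    u2 = [d + o for d, o in zip(I_d, I_o)]
    best = max(u2)
    # Exact equality is used for ties; measured values would need a tolerance.
    candidates = [ch for ch, value in enumerate(u2) if value == best]
    return rng.choice(candidates), u2

# Hypothetical estimates for K = 3 channels.
I_d = [-1.0, -0.25, -0.5]
I_o = [-0.5, -0.5, -0.25]
channel, u2_per_channel = select_channel(I_d, I_o, random.Random(0))
```

In this example channels 1 and 2 tie at -0.75, so either may be announced in START_CH.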

We note that when a call ends, only a two-way handshake is required (END, ACK_END) 
to announce the release of the channel for that particular user. Upon hearing these end-of-call 
signaling packets, all transmitters and receivers update their CSTs accordingly. 

We can see that a different copy of the CST should be kept at the transmitter 
and at the receiver (CST_t and CST_r, respectively). The entries of each table will contain 
the neighboring users that have requested a channel, the channel frequency, and the esti- 
mated link gain to the transmitter/receiver of that particular user (for CST_r and CST_t, 
respectively). 

The proposed potential game framework has the advantage that an equilibrium is reached 
very fast following a best response dynamic, but requires substantial information on the 
interference created to other users and additional coordination for sequential updates. We 
note, however, that the sequential updates procedure also resolves the potential conflicts on 
accessing the common control channel. 

The potential game formulation is suitable for designing a cooperative spectrum sharing 
etiquette, but cannot be used to analyze scenarios involving selfish users, or scenarios in- 
volving heterogeneous users (with various utility functions corresponding to different QoS 
requirements). In the following section, we present a more general design approach, based 
on no-regret learning techniques, which alleviates the above mentioned problems. 
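The best response dynamic with random access described in this section can be sketched as below. This is a simplified illustration under stated assumptions, not the simulated protocol: all names are hypothetical, the gains are random numbers rather than measured values, the potential is taken as half the sum of the U2 utilities, and a user keeps its current channel whenever that channel is already a best response.

```python
import random

def f(s_a, s_b):
    return 1 if s_a == s_b else 0

def u2(i, S, p, G):
    """Cooperative utility of user i: minus interference seen, minus interference created."""
    n = len(S)
    seen = sum(p[j] * G[i][j] * f(S[j], S[i]) for j in range(n) if j != i)
    created = sum(p[i] * G[j][i] * f(S[i], S[j]) for j in range(n) if j != i)
    return -seen - created

def potential(S, p, G):
    """Network-level potential, taken here as half the sum of the U2 utilities."""
    return 0.5 * sum(u2(i, S, p, G) for i in range(len(S)))

def best_response(i, S, p, G, K, rng):
    """Channel maximizing U2 for user i; the current channel is kept if already optimal."""
    values = []
    for ch in range(K):
        trial = list(S)
        trial[i] = ch
        values.append(u2(i, trial, p, G))
    best = max(values)
    if values[S[i]] == best:
        return S[i]
    return rng.choice([ch for ch in range(K) if values[ch] == best])

def run(S, p, G, K, slots, rng):
    """Each slot, every user updates with probability 1/N; returns profile and potential trace."""
    n = len(S)
    S = list(S)
    trace = [potential(S, p, G)]
    for _ in range(slots):
        movers = [i for i in range(n) if rng.random() < 1.0 / n]
        # Simultaneous movers all react to the profile observed at the start of the slot.
        updates = {i: best_response(i, S, p, G, K, rng) for i in movers}
        for i, ch in updates.items():
            S[i] = ch
        trace.append(potential(S, p, G))
    return S, trace

rng = random.Random(1)
N, K = 8, 3
p = [1.0] * N
G = [[0.0 if i == j else rng.random() for j in range(N)] for i in range(N)]
S0 = [rng.randrange(K) for _ in range(N)]
S_final, trace = run(S0, p, G, K, slots=200, rng=rng)
```

When exactly one user moves in a slot the potential cannot decrease; slots with several simultaneous movers are the only ones in which `trace` may dip.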

3.3 $\Phi$-No-Regret Learning for Dynamic Channel Allocation 

While we showed in the previous section that the game with the U2 utility function fits 
the framework of an exact potential game, the U1 function lacks the necessary symmetry 
properties that will ensure the existence of a potential function. In order to analyze the 
behavior of the selfish users game, we resort to the implementation of adaptation protocols 
using regret minimization learning algorithms. No-regret learning algorithms are probabilis- 
tic learning strategies that specify that players explore the space of actions by playing all 
actions with some non-zero probability, and exploit successful strategies by increasing their 
selection probability. While traditionally these types of learning algorithms have been char- 
acterized using a regret measure (e.g. external regret is defined as the difference between 
the payoffs achieved by the strategies prescribed by the given algorithm, and the payoffs 
obtained by playing any other fixed sequence of decisions in the worst case), more recently 
their performance has been related to game theoretic equilibria. 

A general class of no-regret learning algorithms, called $\Phi$-no-regret learning algorithms, is 
shown in [13] to relate to a class of equilibria named $\Phi$-equilibria. No-external-regret and 
no-internal-regret learning algorithms are specific cases of $\Phi$-no-regret learning algorithms. 
$\Phi$ describes the set of strategies to which the play of a learning algorithm is compared. A 
learning algorithm is said to be $\Phi$-no-regret if and only if no regret is experienced for playing 
as the algorithm prescribes, instead of playing according to any of the transformations of the 
algorithm's play prescribed by elements of $\Phi$. It is shown in [13] that the empirical distri- 
bution of play of $\Phi$-no-regret algorithms converges to a set of $\Phi$-equilibria. It is also shown 
that no-regret learning algorithms have the potential to learn mixed strategy (probabilistic) 
equilibria. We note that Nash equilibrium is not a necessary outcome of any $\Phi$-no-regret 
learning algorithm [13]. 

We propose an alternate solution for our spectrum sharing problem, based on a no- 
external-regret learning algorithm with exponential updates, proposed in [14]. 

Let $U_i^t(s_i)$ denote the cumulative utility obtained by user i through time t by choosing 
strategy $s_i$: $U_i^t(s_i) = \sum_{x=1}^{t} U_i(s_i, s_{-i}^x)$. For $\beta > 0$, the weight (probability) assigned to 
strategy $s_i$ at time t + 1 is given by: 

$$ w_i^{t+1}(s_i) = \frac{(1+\beta)^{U_i^t(s_i)}}{\sum_{s_i' \in S_i} (1+\beta)^{U_i^t(s_i')}}. \tag{8} $$

In [14], based on simulation results, it is shown that the above learning algorithm con- 
verges to Nash equilibrium in games for which a pure strategy Nash equilibrium exists. We also 
show by simulations that the proposed channel allocation no-regret algorithm converges to 
a pure strategy Nash equilibrium for cooperative users (utility U2), and to a mixed strategy 
equilibrium for selfish users (utility U1). 

By following our proposed learning adaptation process, the users learn how to choose the 
frequency channels to maximize their rewards through repeated play of the game. 

For the case of selfish users, the amount of information required by this spectrum sharing 
algorithm is minimal: users need to measure the interference temperature at their intended 
receivers (function U1) and to update their weights for channel selection accordingly, to 
favor the channel with minimum interference temperature (equal transmitted powers are 
assumed). We note that the no-regret algorithm in (8) requires that the weights be updated 
for all possible strategies, including the ones that were not currently played. The reward 
obtained if other actions were played can be easily estimated by measuring the interference 
temperature for all channels. 
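One learning step for a single selfish user can be sketched as follows. The multiplicative form used here, with weights proportional to $(1+\beta)$ raised to the cumulative utility, is an assumption about the exact shape of the exponential update; the function names and the numbers are hypothetical. The per-slot utility of every channel is taken to be the negated interference measured on it, so all channels are updated, not only the one that was played.

```python
import random

def update_weights(cum_utility, beta):
    """Probabilities proportional to (1 + beta) ** cumulative utility (assumed update form)."""
    top = max(cum_utility)
    # Subtracting the maximum leaves the normalized weights unchanged and
    # prevents all scores from underflowing to zero.
    scores = [(1.0 + beta) ** (u - top) for u in cum_utility]
    total = sum(scores)
    return [s / total for s in scores]

def learning_step(cum_utility, slot_utility, beta, rng):
    """Accumulate this slot's utility of every channel, refresh weights, draw next channel."""
    cum_utility = [c + u for c, u in zip(cum_utility, slot_utility)]
    weights = update_weights(cum_utility, beta)
    next_channel = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return cum_utility, weights, next_channel

# Hypothetical two-channel example: channel 0 carried one unit of interference this slot.
cum, weights, channel = learning_step([0.0, 0.0], [-1.0, 0.0], beta=1.0, rng=random.Random(0))
```

With $\beta = 1$ the interfered channel ends up with weight 1/3 and the clean one with 2/3, so the user shifts probability towards the quieter channel while still exploring the other.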

For the case of cooperative users, the information needed to compute U2 is similar to 
the case of potential game formulation. We note that, while the learning algorithm does 
not require sequential updates to converge to an equilibrium, the amount of information 
exchange on the common control channel requires coordination to avoid collisions. One 
possible approach to reduce the amount of signaling would be to maintain the access scheme 
proposed in the previous section, which would ensure that on average only one user at a 
time will signal changes in channel allocation. 

4 Simulation Results 

In this section, we present some numerical results to illustrate the performance of the pro- 
posed channel allocation algorithms for both cooperative and selfish users' scenarios. For 
simulation purposes, we consider a fixed wireless ad hoc network (as described in the system 
model section) with N = 30 and D = 200 (30 transmitters and their receivers are randomly 
distributed over a 200m x 200m square area). The adaptation algorithms are illustrated for 
a network of 30 transmitting radios, sharing K = 4 available channels. A random channel 
assignment is selected as the initial assignment and for a fair comparison, all the simulations 
start from the same initial channel allocation. 

We first illustrate the convergence properties of the proposed spectrum sharing algo- 
rithms. We can see that for cooperative games, both the potential game formulation, as well 
as the learning solution converge to a pure strategy Nash equilibrium (Figures 121 HI [101 and 
fTT|) . In Figure El we illustrate the changes in the potential function as the potential game 
evolves, and it can be seen that indeed by distributively improving their utility, the users 
positively affect the overall utility of the network, which is approximated by the potential 
function. 

By contrast, the selfish users' learning strategy converges to a mixed strategy equilibrium, 
as it can be seen in Figures IT^ and IT^ 

As performance measures for the proposed algorithms we consider the achieved SIRs and 
throughputs (adaptive coding is used to ensure a certain BER target, as previously explained 
in Section 2). We consider the average performance per user as well as the variability in the 
achieved performance (fairness), measured in terms of variance and CDF. 
We first give results for the potential game based algorithm. 

The choice of the utility function for this game enforces a certain degree of fairness in 
distributing the network resources, as it can be seen in figures El El and |Hl Figures El and 
El illustrate the SIR achieved by the users on each of the 4 different channels for initial and 
final assignments, respectively. An SIR improvement for the users that initially had a low 
performance can be noticed, at the expense of a slight penalty in performance for users with 
initially high SIR. It can be seen in Figure 13 that at the Nash equilibrium point, the number 
of users having an SIR below dB has been reduced. Furthermore, figure |H1 shows that the 
percentage of the users who have an SIR below 5 dB decreases from 60% to about 24%, at 
the expense of a slight SIR decrease for users with an SIR greater than 12.5 dB. 

The advantage of the potential game is illustrated in figure El in terms of the normalized 
achievable throughput at each receiver. For the initial channel assignment, 62% of the 
users have a throughput less than 0.75. At the equilibrium, this fraction is reduced to 
38%. Aggregate normalized throughput improvements for the potential game formulation 
are illustrated in Table El 

Our simulation results show very similar performance for the learning algorithm in co- 
operative scenarios, with the potential game formulation. Figures 171 and IT^ show the initial 
and final assignment for this algorithm, as well as the achieved SIRs after convergence for all 
users in the network. In terms of fairness, the learning algorithm performs slightly worse than 
the potential game formulation (Figure El) • However, even though the equilibrium point for 
learning is different than that of the potential game, the two algorithms achieve very close 
throughput performance (Table E}- 

As we previously mentioned, the learning algorithm for selfish users does not lead to a 
pure strategy Nash equilibrium channel allocation. In FigurelT!?lwe illustrate the convergence 
properties for an arbitrarily chosen user, which converges to a mixed strategy allocation: 
selects channel 1 with probability 0.575 or channel 3 with probability 0.425. The evolutions 
of the weights for all the users in the network are shown in Figure El 

We compare the performance of the proposed algorithms for both cooperative and non- 
cooperative scenarios. The performance measures considered are the average SIR, average 
throughput per user, and total average throughput for the network. At the beginning of each 
time slot, every user will either choose the same equilibrium channel for transmission (in 
cooperative games with pure strategy Nash equilibrium solutions), or will choose a channel 
to transmit with some probability given by the mixed strategy equilibrium (i.e. learning 
using Ul). In the random channel allocation scheme, every user chooses a channel with 
equal probability from a pool of four channels. 
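The per-slot channel choice used in this comparison can be illustrated with a short sketch. The name `draw_channels` is hypothetical and channels are indexed from 0 here, so the mixed strategy below (channels 1 and 3 in the text) uses indices 0 and 2.

```python
import random

def draw_channels(strategies, rng):
    """Draw one channel per user for a time slot from each user's channel distribution."""
    return [rng.choices(range(len(w)), weights=w, k=1)[0] for w in strategies]

K = 4
pure = [0.0, 0.0, 1.0, 0.0]        # pure strategy: always the same equilibrium channel
mixed = [0.575, 0.0, 0.425, 0.0]   # mixed strategy equilibrium of a selfish user
uniform = [1.0 / K] * K            # random allocation: equal probability over four channels

rng = random.Random(0)
slots = [draw_channels([pure, mixed, uniform], rng) for _ in range(1000)]
```

Time-averaged SIR and throughput are then obtained by evaluating each drawn assignment and averaging over the slots.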

Figure El shows the CDF of Time Average SIR in different games. All learning games 
and the potential game outperform the random channel allocation scheme. The potential 
game has the best throughput performance, followed closely by the cooperative learning 
scheme. It can be seen in Figure El that half of the users have an average throughput below 
0.3 in the random allocation scheme. The percentage of users whose average throughput is 
below 0.3 is 23% in potential game, 27% for learning using U2 and 34% for learning using 
Ul, while the fraction is 51% for the random selection. 

In Figure El we summarize the performance comparisons among the proposed schemes: 
total average throughput, average throughput per user, and variance of the throughput per 
user. The variance performance measure quantifies the fairness, with the fairest scheme 
achieving the lowest variance. Among all the proposed schemes, the potential channel allocation 
game has the best performance. It is interesting to note that in terms of average obtained 
throughput per user, the three schemes perform very similarly, but differ in the performance 
variability across users. It seems that even when cooperation is enforced by appropriately 
defining the utility, the potential game formulation provides a fairness advantage over the 
no-regret learning scheme. 
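The summary measures used in this comparison can be computed as in the sketch below. The function name, the threshold and the throughput values are hypothetical, and the population variance is assumed as the fairness measure.

```python
from statistics import mean, pvariance

def fairness_summary(throughputs, threshold):
    """Total and per-user average throughput, variance across users, and CDF value at threshold."""
    return {
        "total": sum(throughputs),
        "mean": mean(throughputs),
        "variance": pvariance(throughputs),  # lower variance means a fairer allocation
        "fraction_below": sum(1 for t in throughputs if t < threshold) / len(throughputs),
    }

# Hypothetical normalized throughputs of four users.
summary = fairness_summary([0.2, 0.4, 0.6, 0.8], threshold=0.3)
```

Two schemes with the same `mean` can still differ in `variance` and `fraction_below`, which is the distinction drawn above between the potential game and the learning schemes.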
5 Conclusion 



In this work, we have investigated the design of channel sharing etiquette for cognitive radio 
networks for both cooperative and non-cooperative scenarios. Two different formulations 
for the channel allocation game were proposed: potential game formulation, and no-regret 
learning. We showed that all the proposed spectrum sharing policies converge to a chan- 
nel allocation equilibrium, although a pure strategy allocation can be achieved only for 
cooperative scenarios. Our simulation results have shown that the average performance in 
terms of SIR or achievable throughput is very similar for both learning and potential game 
formulation, even for the case of selfish users. However, in terms of fairness, we showed 
that both cooperation and allocation strategy play an important role. While the proposed 
potential game formulation yields the best performance, its applicability is limited to co- 
operative environments and significant knowledge about neighboring users is required for 
the implementation. By contrast, the proposed no-regret learning algorithm is suitable for 
non-cooperative scenarios and requires only a minimal amount of information exchange. 
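
As an illustration of how a pure strategy allocation can be reached in the cooperative case, the sketch below runs sequential best-response updates with a cooperative utility (interference received plus interference caused). It is a minimal sketch under stated assumptions, not the simulation setup of this work: the random gain matrix `G`, the equal powers `p`, the sizes `N` and `K`, and the sequential update order are all hypothetical choices made for the example.

```python
import random


def utility(i, s, p, G):
    """Cooperative utility of user i: minus (interference received + caused).

    G[i][j] is assumed to be the gain from transmitter j to receiver i.
    """
    n = len(s)
    received = sum(p[j] * G[i][j] for j in range(n) if j != i and s[j] == s[i])
    caused = sum(p[i] * G[j][i] for j in range(n) if j != i and s[j] == s[i])
    return -received - caused


def best_response_dynamics(s, p, G, num_channels, max_rounds=1000, tol=1e-12):
    """Let users switch channels one at a time until nobody wants to move."""
    s = list(s)
    for _ in range(max_rounds):
        moved = False
        for i in range(len(s)):
            best_c, best_u = s[i], utility(i, s, p, G)
            for c in range(num_channels):
                trial = s[:i] + [c] + s[i + 1:]
                u = utility(i, trial, p, G)
                if u > best_u + tol:  # switch only on strict improvement
                    best_c, best_u = c, u
            if best_c != s[i]:
                s[i] = best_c
                moved = True
        if not moved:
            return s, True  # no user can improve: pure equilibrium
    return s, False


rng = random.Random(0)
N, K = 10, 4                      # hypothetical network size and channel count
p = [1.0] * N                     # assumed equal transmit powers
G = [[rng.random() for _ in range(N)] for _ in range(N)]
start = [rng.randrange(K) for _ in range(N)]
final, converged = best_response_dynamics(start, p, G, K)
print("converged:", converged, "allocation:", final)
```

Because every accepted switch strictly increases the switching user's utility, and hence the potential function, the updates cannot cycle in this cooperative setting.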

6 Appendix 

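Proof: Suppose there is a potential function of game $\Gamma$: 

\[
Pot'(S) = \sum_{i=1}^{N} \left( -a \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - (1-a) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \right) \tag{9}
\]

where $0 < a < 1$. Then for all $i \in \{1, 2, \ldots, N\}$, 

\begin{align*}
Pot'(s_i, s_{-i})
&= -a \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - (1-a) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \\
&\quad + \sum_{k=1,\, k \neq i}^{N} \left( -a \sum_{j=1,\, j \neq k}^{N} p_j G_{kj} f(s_j, s_k) - (1-a) \sum_{j=1,\, j \neq k}^{N} p_k G_{jk} f(s_k, s_j) \right) \\
&= -a \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - (1-a) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \\
&\quad + \sum_{k=1,\, k \neq i}^{N} \Big[ -a\, p_i G_{ki} f(s_i, s_k) - a \sum_{j=1,\, j \neq k,\, j \neq i}^{N} p_j G_{kj} f(s_j, s_k) \\
&\qquad\quad - (1-a)\, p_k G_{ik} f(s_k, s_i) - (1-a) \sum_{j=1,\, j \neq k,\, j \neq i}^{N} p_k G_{jk} f(s_k, s_j) \Big] \\
&= -a \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - (1-a) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \\
&\quad - a \sum_{k=1,\, k \neq i}^{N} p_i G_{ki} f(s_i, s_k) - (1-a) \sum_{k=1,\, k \neq i}^{N} p_k G_{ik} f(s_k, s_i) \\
&\quad + \sum_{k=1,\, k \neq i}^{N} \left( -a \sum_{j=1,\, j \neq k,\, j \neq i}^{N} p_j G_{kj} f(s_j, s_k) - (1-a) \sum_{j=1,\, j \neq k,\, j \neq i}^{N} p_k G_{jk} f(s_k, s_j) \right).
\end{align*}

Let 

\[
Q(s_{-i}) = \sum_{k=1,\, k \neq i}^{N} \left( -a \sum_{j=1,\, j \neq k,\, j \neq i}^{N} p_j G_{kj} f(s_j, s_k) - (1-a) \sum_{j=1,\, j \neq k,\, j \neq i}^{N} p_k G_{jk} f(s_k, s_j) \right).
\]

Then, 

\begin{align*}
Pot'(s_i, s_{-i})
&= -a \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - (1-a) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \\
&\quad - a \sum_{k=1,\, k \neq i}^{N} p_i G_{ki} f(s_i, s_k) - (1-a) \sum_{k=1,\, k \neq i}^{N} p_k G_{ik} f(s_k, s_i) + Q(s_{-i}) \\
&= -(a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - (a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) + Q(s_{-i}).
\end{align*}

If user $i$ changes its strategy from $s_i$ to $s_i'$, we get: 

\begin{align*}
Pot'(s_i', s_{-i})
&= -a \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i') - (1-a) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i', s_j) \\
&\quad - a \sum_{k=1,\, k \neq i}^{N} p_i G_{ki} f(s_i', s_k) - (1-a) \sum_{k=1,\, k \neq i}^{N} p_k G_{ik} f(s_k, s_i') + Q(s_{-i}) \\
&= -(a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i') - (a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i', s_j) + Q(s_{-i}).
\end{align*}

Here $Q(s_{-i})$ is not affected by the strategy change of user $i$. Hence, 

\begin{align*}
Pot'(s_i', s_{-i}) - Pot'(s_i, s_{-i})
&= -(a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i') - (a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i', s_j) \\
&\quad - \Big[ -(a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - (a + (1-a)) \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \Big] \\
&= -\sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i') - \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i', s_j) \\
&\quad - \Big( -\sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \Big).
\end{align*}

From the definition of the utility function, 

\begin{align*}
U_i(s_i', s_{-i}) - U_i(s_i, s_{-i})
&= -\sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i') - \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i', s_j) \\
&\quad - \Big( -\sum_{j=1,\, j \neq i}^{N} p_j G_{ij} f(s_j, s_i) - \sum_{j=1,\, j \neq i}^{N} p_i G_{ji} f(s_i, s_j) \Big), \quad \forall i = 1, 2, \ldots, N,
\end{align*}

and therefore 

\[
U_i(s_i', s_{-i}) - U_i(s_i, s_{-i}) = Pot'(s_i', s_{-i}) - Pot'(s_i, s_{-i}), \quad \forall i = 1, 2, \ldots, N.
\]

So, $Pot'(S)$ in (9) is an exact potential function of game $\Gamma$. If we set $a$ to $1/2$ in (9), 
$Pot'(S)$ is the same as $Pot(S)$ defined in the main text, which proves that $Pot(S)$ is an exact 
potential function of game $\Gamma$. 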
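
The exact-potential property shown in this proof can also be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it assumes the interference indicator f equals 1 when two users share a channel and 0 otherwise, draws random powers and gains, and compares the change in a user's utility with the change in the potential of (9) after a unilateral deviation. The helper names (`f`, `utility`, `potential`, `max_gap`) and the problem sizes are hypothetical.

```python
import random


def f(a, b):
    """Assumed interference indicator: 1 if two users share a channel."""
    return 1 if a == b else 0


def utility(i, s, p, G):
    """Utility of user i: minus (interference received + interference caused)."""
    n = len(s)
    return -sum(p[j] * G[i][j] * f(s[j], s[i]) + p[i] * G[j][i] * f(s[i], s[j])
                for j in range(n) if j != i)


def potential(s, p, G, a=0.5):
    """Candidate potential function of equation (9) with weight a."""
    n = len(s)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if j != i:
                total -= a * p[j] * G[i][j] * f(s[j], s[i])
                total -= (1 - a) * p[i] * G[j][i] * f(s[i], s[j])
    return total


def max_gap(n=6, k=3, trials=200, a=0.5, seed=1):
    """Largest |delta U_i - delta Pot'| over random unilateral deviations."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        p = [rng.uniform(0.5, 2.0) for _ in range(n)]
        G = [[rng.random() for _ in range(n)] for _ in range(n)]
        s = [rng.randrange(k) for _ in range(n)]
        i = rng.randrange(n)
        s_new = list(s)
        s_new[i] = rng.randrange(k)
        du = utility(i, s_new, p, G) - utility(i, s, p, G)
        dpot = potential(s_new, p, G, a) - potential(s, p, G, a)
        worst = max(worst, abs(du - dpot))
    return worst


print("max gap, a = 0.5:", max_gap(a=0.5))
print("max gap, a = 0.3:", max_gap(a=0.3))
```

The gap should vanish up to floating-point rounding for any weight in (0, 1), consistent with the factor a + (1 - a) = 1 appearing in the derivation.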
Table 1: Code rates of Reed-Muller code RM(1, m) and corresponding SIR requirement for 
target BER = 10^-3 



m     Code rate    SIR (dB) 
2     0.75         6 
3     0.5          5.15 
4     0.3125       4.6 
5     0.1875       4.1 
6     0.1094       3.75 
7     0.0625       3.45 
8     0.0352       3.2 
9     0.0195       3.1 
10    0.0107       2.8 
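
The code rates in Table 1 follow from the parameters of the first-order Reed-Muller code RM(1, m), which carries k = m + 1 information bits in a codeword of length n = 2^m. The short sketch below recomputes them; the function name `rm1_code_rate` is hypothetical, and the reference values are copied from the table.

```python
def rm1_code_rate(m):
    """Code rate k/n of the first-order Reed-Muller code RM(1, m)."""
    k = m + 1      # number of information bits
    n = 2 ** m     # codeword length
    return k / n


# Code rates as listed in Table 1 (rounded to four decimals there).
table_rates = {2: 0.75, 3: 0.5, 4: 0.3125, 5: 0.1875, 6: 0.1094,
               7: 0.0625, 8: 0.0352, 9: 0.0195, 10: 0.0107}

computed = {m: round(rm1_code_rate(m), 4) for m in table_rates}

for m in sorted(table_rates):
    print(m, computed[m], table_rates[m])
```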



Table 2: SIR and normalized throughput of all users at initial and final channel assignment 





Channel assignment        Total normalized throughput 
Initial                   9.4 
Final (Potential Game)    16.5 
Final (Learning U2)       15.3 
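
To put the totals of Table 2 in relative terms, the short calculation below derives the throughput gain of each final assignment over the initial one. The three totals are taken from the table; the helper `relative_gain` is a hypothetical name introduced only for this example.

```python
# Total normalized throughput values copied from Table 2.
initial = 9.4
final_potential = 16.5
final_learning_u2 = 15.3


def relative_gain(final, start):
    """Fractional improvement of `final` over `start`."""
    return (final - start) / start


gain_potential = relative_gain(final_potential, initial)
gain_learning = relative_gain(final_learning_u2, initial)
learning_vs_potential = final_learning_u2 / final_potential

print(f"potential game gain over initial: {gain_potential:.1%}")
print(f"learning (U2) gain over initial:  {gain_learning:.1%}")
print(f"learning (U2) / potential game:   {learning_vs_potential:.1%}")
```

With these totals, the potential game improves on the initial assignment by roughly 76% and learning with U2 by roughly 63%.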
Figure 1: A snapshot of the nodes' positions and network topology 

Figure 2: Potential game: convergence of users' strategies 
[Plot title: The convergence of the strategies (30 nodes)] 

Figure 3: Evolution of potential function 
[Plot title: The potential function of the game; x-axis: T, number of trials] 

Figure 4: Potential game: strategy evolution for selected arbitrary users 

Figure 5: SIRs for initial channel assignment 
[Panels: the initial SIRs within channels 1 to 4] 

Figure 6: Potential Game: SIRs at final channel assignment 
[Panels: the final SIRs within each channel] 

Figure 7: SIRs histogram. Initial Channel Assignment vs. Final Channel Assignment 
[Plot title: The Histogram of the SIRs over all users] 



Figure 8: CDF for the achieved SIRs. Initial Channel Assignment vs. Final Channel Assignment 
[Plot title: CDF of SIRs over the nodes in Potential Game] 

Figure 9: CDF for the achieved throughputs. Initial Channel Assignment vs. Final Channel 
Assignment 
[Plot title: CDF of Throughputs over the nodes in Potential Game; x-axis: Value of Throughput; 
legend: Initial Assignment, Final Assignment (Potential Game), Final Assignment (Learning U2)] 



Figure 10: No-regret learning for cooperative users: weights distribution evolution for an 
arbitrary user 
[Plot title: The action distribution of One Node: Model 4; legend: Action 1, Action 2, 
Action 3, Action 4] 

Figure 11: No-regret learning for cooperative users: weights distribution evolution for all 
users 
[Plot title: The action distribution] 

Figure 12: No-regret learning for cooperative users: SIR of users in different channels at 
Nash equilibrium 
[Panels: the final SIRs within each channel] 

Figure 13: No-regret learning for selfish users: weights evolution for an arbitrary user 
[Plot title: The action distribution of One Node: Node14] 

Figure 14: No-regret learning for selfish users: Evolution of weights for all users 

Figure 15: The CDF of Time Average SIRs 
[Plot title: The CDF of Average SIR in different game] 

Figure 16: The CDF of Average Throughput 
[Plot title: The CDF of Average Throughput in different game] 